A New Era In Cosmology 

ASP Conference Series, Vol. , 2002 

Tom Shanks and Nigel Metcalfe, eds. 



Studying Structure Formation with the Sloan Digital Sky Survey



David H. Weinberg 

Department of Astronomy, Ohio State University, Columbus, OH 43210, USA

Abstract. I review some of the recent results from the SDSS related to 
galaxies and large scale structure, including: (1) discovery of coherent, 
unbound structures in the stellar halo of the Milky Way, (2) demonstra- 
tion that the Pal 5 globular cluster has tidal tails and that the Draco 
dwarf spheroidal does not, (3) precise measurement of the galaxy lumi- 
nosity function and its variation with galaxy surface brightness, color, and 
morphology, (4) detailed examination of the Fundamental Plane from a 
sample of 9000 early type galaxies, (5) measurement, via galaxy-galaxy 
lensing, of the extended dark matter distributions around galaxies and 
their variation with galaxy luminosity, morphology, and environment, 
(6) measurements of the galaxy angular power spectrum and of the spa- 
tial correlation function and pairwise velocity dispersion as a function of 
galaxy luminosity and color. I then turn to a more abstract discussion 
of what we can hope to learn, in the long run, from galaxy clustering in 
the SDSS and the 2dFGRS. The clustering of a galaxy sample depends 
on the mass function and clustering of the dark halo population, and on 
the Halo Occupation Distribution (HOD), which specifies the way that 
galaxies populate the halos. Hydrodynamic simulations and semi-analytic 
models of galaxy formation make similar predictions for the probability 
P(N|M) that a halo of virial mass M contains N galaxies of a specified
type: a non-linear form of the mean occupation N_avg(M), sub-Poisson
fluctuations about the mean in low mass halos, and a strong dependence
of N_avg(M) on the age of a galaxy's stellar population. Different galaxy
clustering statistics respond to different features of the HOD, making it
possible to determine the HOD empirically given an assumed cosmological
model. Furthermore, changes to Ω_m and/or the linear power spectrum
produce changes in the halo population that would be difficult to mask 
by changing the HOD. Ultimately, we can hope to have our cake and eat 
it too, obtaining strong guidance to the physics of galaxy formation by 
deriving the HOD of different classes of galaxies, while simultaneously 
carrying out precision tests of cosmological models. 



1. The SDSS: Goals and Status 

The observational goals of the SDSS have remained stable since its early days: 
(1) u, g, r, i, z CCD imaging of 10^4 deg^2 in the North Galactic Cap, to a depth
~23 mag in the most sensitive bands, (2) a spectroscopic survey in this area of
10^6 galaxies, 10^5 quasars, and an assortment of stars and other targets, and (3)
imaging of three 2.5° × 90° stripes in the South Galactic Cap, with the equatorial
stripe scanned repeatedly to allow variability studies and co-added imaging that 
goes ~ 2 mag deeper than a single scan. The normal spectroscopic program 
is carried out on all three stripes in the south, and additional spectroscopy in 
the equatorial stripe will provide deeper samples of quasars and galaxies and 
more comprehensive coverage of stellar targets. The survey is carried out using 
a dedicated 2.5-m telescope on Apache Point, New Mexico, equipped with a 
mosaic CCD camera (Gunn et al. 1998) and two fiber-fed double spectrographs 
that can obtain 640 spectra simultaneously (Uomoto et al., in preparation). A 
technical overview of the survey appears in York et al. (2000), and an updated 
but more focused technical summary appears in the paper by Stoughton et al. 
(2002) that describes the Early Data Release. The quasar survey is reviewed by 
Schneider et al. (these proceedings), and the quasar target selection is described 
by Richards et al. (2002). There are two samples in the galaxy redshift survey, 
a magnitude limited sample to r = 17.77 comprising 90% of the galaxy targets 
(Strauss et al. 2002), and a sparser, deeper sample of luminous red galaxies 
(Eisenstein et al. 2001). 

After a decade of preparatory work, the SDSS formally began operations on 
the auspicious date of April 1, 2000. It is planned to run until summer, 2005. The 
total area covered will depend on the weather between now and then, especially 
the amount of weather that satisfies the seeing and photometric requirements 
of the imaging survey. A reasonable guess is that the northern survey will 
cover ~70% of the original 10^4 deg^2 goal by summer, 2005. As of January,
2002, the SDSS had obtained ~3200 square degrees of imaging and ~230,000
spectra (of which about 80% are galaxies), including both northern and southern
observations. The quality of the spectra, which cover the wavelength range
3800 Å to 9200 Å at resolution R ~ 1800, is spectacular. Redshift completeness
for spectroscopically observed galaxies is over 99%, and for most galaxies the 
spectra yield stellar velocity dispersions and valuable diagnostics of the stellar 
populations. The scientific analyses to date are only scratching the surface of 
what the spectroscopic data allow. 

In June, 2001, the SDSS released 462 deg^2 of imaging data (~14 million detected
objects) and 54,000 follow-up spectra obtained during commissioning observations
and the first phases of the survey proper, as documented by Stoughton
et al. (2002). In addition to providing data for the larger community, the Early 
Data Release is a training exercise for the SDSS collaboration. One lesson is that 
releasing data as complex as that in the SDSS (e.g., radial profiles plus over 80 
measured parameters for each photometric object, some of which are deblended, 
some imaged on more than one scan, some observed spectroscopically) in a use- 
ful way is very challenging, even within the collaboration itself. The First Data 
Release will take place in January, 2003, and subsequent releases will take place 
on a roughly annual basis (see http://www.sdss.org/science/index.html). 

This paper is based on two talks that I gave at the New Era in Cosmology 
conference. Section 2 reviews some of the recent SDSS results on galaxies and 
large scale structure; the topics mentioned are some of the ones that I have found 
interesting myself, and they by no means constitute a comprehensive list. All of 
these results are published or available on astro-ph, so my summaries are brief
and do not include figures. In Section 3, I discuss what we can hope to learn 
from studying galaxy clustering in the SDSS and the 2dFGRS, with focus on the 
Halo Occupation Distribution as a way of thinking about the relation between 
galaxies and dark matter. This discussion is based on collaborative work with 
Andreas Berlind, Zheng Zheng, Jeremy Tinker, and others. 

2. A Review of Recent Results 

2.1. Substructure in the Milky Way 

The first science results from the SDSS that really surprised me were the dis- 
coveries, made independently by Yanny et al. (2000) using A-colored stars and 
by Ivezic et al. (2000) using RR Lyrae candidates, of coherent, unbound struc- 
tures in the Milky Way's stellar halo, stretching across tens of degrees. The idea 
that the stellar halo might be built by mergers of dwarf galaxies is an old one 
(Searle & Zinn 1978), and much of the recent theoretical modeling has focused 
on detecting fossil substructure through phase space studies of the local stellar 
distribution (e.g., Johnston et al. 1995; Helmi & White 1999; Helmi & de Zeeuw
2000). Even in the absence of kinematic data, the SDSS is a powerful tool for
detecting substructure in the outer halo because multi-color imaging allows the 
definition of samples of stars that are approximately standard candles. The two 
structures found by Yanny et al. (2000), in an area ~ 1% of the sky, may both 
be associated with the tidal stream of the Sagittarius dwarf galaxy (Ibata et al.
2001). However, a recent study using F-stars (Newberg et al. 2001) appears to
show several more substructures, and no clear indication that there is a smooth 
underlying halo at all. Extending a model originally developed to investigate 
the dwarf satellite problem, Bullock, Kravtsov, & Weinberg (2001) showed that 
the population of disrupted dwarfs expected in the CDM cosmological scenario 
could naturally account for the entire stellar halo. If this model is right, then 
the SDSS should reveal ubiquitous substructure in the outer halo, where orbital 
times are long and the number of discrete streams is relatively small. In any 
event, the SDSS imaging survey will answer fundamental questions about the 
origin of the Milky Way's stellar halo, and perhaps about the amount of power 
on sub-galactic scales in the primordial fluctuation spectrum. 
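The standard-candle argument is simple arithmetic: if stars of a given color class share roughly the same absolute magnitude M, each apparent magnitude m gives a distance through the distance modulus m - M = 5 log10(d / 10 pc). A minimal sketch follows; the absolute magnitude used is a hypothetical placeholder, not a calibrated value for A-colored or RR Lyrae stars, and extinction is ignored.

```python
# Hypothetical absolute magnitude for a standard-candle population; an
# illustrative placeholder, not a calibrated value for A-colored or RR Lyrae stars.
ABS_MAG_ASSUMED = 0.7


def distance_kpc(apparent_mag, abs_mag=ABS_MAG_ASSUMED):
    """Distance from the distance modulus m - M = 5 log10(d / 10 pc), ignoring extinction."""
    mu = apparent_mag - abs_mag
    d_pc = 10.0 ** (mu / 5.0 + 1.0)
    return d_pc / 1000.0


# Stars of one type at different apparent magnitudes map to different distances,
# so a stream appears as a coherent feature in (sky position, distance).
for m in (15.7, 17.2, 20.7):
    print(f"m = {m:4.1f}  ->  d = {distance_kpc(m):6.1f} kpc")
```

Each magnitude of depth corresponds to a factor 10^0.2 ≈ 1.58 in distance, which is why deep multi-color imaging alone can trace structures far out into the halo.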

2.2. Milky Way Satellites 

With multi-color data, one can define optimal filters to maximize the contrast 
between an object in the Milky Way or Local Group and the foreground and 
background stellar distributions. Odenkirchen et al. (2001a,b) have developed
this technique and applied it to great effect in studies of the globular cluster Pal 
5 and the dwarf spheroidal Draco. Pal 5 shows two well defined tidal tails that 
contain ~ 1/3 of the cluster's stars, demonstrating that the cluster is subject to 
heavy mass loss. The orientation of the tails reveals the projected direction of the 
cluster's orbit, and peaks within the tails may be a signature of disk shocking 
events. Draco, on the other hand, shows no sign of tidal extensions even at 
surface densities ~10^-3 of the central value, demonstrating that it is a bound,
equilibrium system, and justifying standard kinematic estimates that yield a
high mass-to-light ratio, M/L ~ 100-150. As the SDSS covers more sky, we
will get a more complete census of which objects are being tidally destroyed and 
which are still holding themselves together. Tidal tails and tidal streams may 
also provide valuable constraints on the radial profile, shape, and clumpiness of
the Milky Way's dark halo potential (see, e.g., Johnston et al. 1999). 
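One way to build such a contrast-maximizing filter can be sketched as a greedy selection in color-magnitude space. The sketch assumes the expected counts of object stars and of foreground/background stars in each color-magnitude cell are already known (in practice they would be estimated from the object's center and from comparison fields). Cells are ranked by member-to-field ratio and the prefix maximizing S/sqrt(S+B) is kept. This is an illustrative variant, not necessarily the exact filter used by Odenkirchen et al.; the function name and counts are hypothetical.

```python
import math


def select_cmd_cells(member, field):
    """Choose color-magnitude cells that maximize S / sqrt(S + B), greedily.

    member[i]: expected number of object (cluster or dwarf) stars in cell i
    field[i]:  expected number of foreground/background stars in cell i
    Returns (sorted list of selected cell indices, best signal-to-noise).
    """
    def ratio(i):
        return member[i] / field[i] if field[i] > 0 else math.inf

    # Cells with the highest member/field contrast enter the filter first.
    order = sorted(range(len(member)), key=ratio, reverse=True)
    best_sn, best_k = 0.0, 0
    s = b = 0.0
    for k, i in enumerate(order, start=1):
        s += member[i]
        b += field[i]
        sn = s / math.sqrt(s + b) if s + b > 0 else 0.0
        if sn > best_sn:
            best_sn, best_k = sn, k
    return sorted(order[:best_k]), best_sn


# Toy counts for four CMD cells (hypothetical numbers).
cells, sn = select_cmd_cells([50, 30, 5, 1], [10, 40, 200, 500])
print(cells, round(sn, 2))
```

Adding the low-contrast cells would raise the member count slightly but swamp it with field stars, so the selection stops after the first two cells in this toy case.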

2.3. The Galaxy Luminosity Function 

Blanton et al. (2001) measured the galaxy luminosity function in u, g, r, i, and 
z using a sample of 11,275 galaxies observed during SDSS commissioning obser- 
vations. The large sample size, accurate photometry, and use of the Petrosian 
(1976) system for defining galaxy magnitudes yield small statistical errors and 
excellent control of systematic effects. This analysis confirms and quantifies pre- 
vious indications that the galaxy luminosity function varies systematically with 
surface brightness, color, and morphology; the first correlation implies that low 
surface brightness galaxies make only a small contribution to the mean lumi- 
nosity density of the universe. The measured luminosity density exceeds the 
R-band estimate from the Las Campanas Redshift Survey (Lin et al. 1996) by a
factor of two, and Blanton et al. show that this difference arises from the isopho- 
tal magnitude definition adopted by the LCRS, which misses light in the outer 
parts of galaxies that have intrinsically low or cosmologically dimmed surface 
brightness. 
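The surface-brightness argument can be made concrete with a toy calculation. The sketch assumes a pure exponential profile I(r) = I0 exp(-r/h) and SDSS-style Petrosian settings (annulus 0.8r to 1.25r, Petrosian ratio 0.2, aperture of two Petrosian radii), which are stated here as assumptions rather than taken from this paper. Dimming the whole profile by a constant factor, as cosmological surface-brightness dimming does, leaves the Petrosian flux fraction unchanged because the Petrosian ratio compares surface brightnesses within the same galaxy, while a fixed isophote captures less of the light.

```python
import math

# Assumed SDSS-style Petrosian settings (illustrative, not taken from this paper).
ANNULUS_IN, ANNULUS_OUT, RATIO_LIMIT, N_PETRO = 0.8, 1.25, 0.2, 2.0


def enclosed_flux(r, i0, h):
    """Flux inside radius r for an exponential profile I(r) = i0 exp(-r/h)."""
    x = r / h
    return 2.0 * math.pi * i0 * h ** 2 * (1.0 - (1.0 + x) * math.exp(-x))


def petrosian_ratio(r, i0, h):
    """Mean surface brightness in the annulus divided by mean surface brightness inside r."""
    annulus = enclosed_flux(ANNULUS_OUT * r, i0, h) - enclosed_flux(ANNULUS_IN * r, i0, h)
    annulus_area = math.pi * (ANNULUS_OUT ** 2 - ANNULUS_IN ** 2) * r ** 2
    inner_mean = enclosed_flux(r, i0, h) / (math.pi * r ** 2)
    return (annulus / annulus_area) / inner_mean


def petrosian_fraction(i0, h):
    """Fraction of total flux inside N_PETRO * r_P, with r_P found by bisection."""
    lo, hi = 1e-3 * h, 50.0 * h  # ratio is ~1 at lo and ~0 at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if petrosian_ratio(mid, i0, h) > RATIO_LIMIT:
            lo = mid
        else:
            hi = mid
    r_p = 0.5 * (lo + hi)
    return enclosed_flux(N_PETRO * r_p, i0, h) / (2.0 * math.pi * i0 * h ** 2)


def isophotal_fraction(i0, h, i_lim):
    """Fraction of total flux inside the isophote where I(r) = i_lim."""
    if i0 <= i_lim:
        return 0.0
    r_iso = h * math.log(i0 / i_lim)
    return enclosed_flux(r_iso, i0, h) / (2.0 * math.pi * i0 * h ** 2)


# Same galaxy shape, with the surface brightness dimmed by a factor of 10.
for i0 in (100.0, 10.0):
    print(i0, round(petrosian_fraction(i0, 1.0), 3), round(isophotal_fraction(i0, 1.0, 1.0), 3))
```

In this toy case the Petrosian aperture captures about 99% of the light at either surface brightness, while the isophotal fraction falls from roughly 94% to roughly 67% when the profile is dimmed.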

Norberg et al. (2001b) demonstrate convincingly that the high-precision 
measurement of the b_J-band luminosity function from the 2dFGRS is in good
agreement with the luminosity function measured by the SDSS. Two main factors
caused Blanton et al. (2001) to reach a contrary conclusion: they used an
inaccurate conversion from SDSS bands to b_J, and their maximum likelihood
method effectively estimates the luminosity function at a redshift z ~ 0.1-0.15,
while the method used by Norberg et al. (2001b) implicitly corrects for evolution
to derive the z = 0 luminosity function. Using the SDSS early release
data, Norberg et al. (2001b) demonstrate excellent agreement in the mean between
SDSS and 2dFGRS galaxy magnitudes and redshifts, and they confirm
earlier estimates of the completeness and stellar contamination of the 2dFGRS
input catalog. While the optical luminosity function estimates are now in good
agreement, there remains a puzzling discrepancy pointed out by Wright (2001)
between the luminosity density found in the optical bands by the SDSS and the
estimates of the K_s-band luminosity density from the 2dFGRS and 2MASS data
(Cole et al. 2001; see also Kochanek et al. 2001). Norberg et al. (2001b) argue
that the discrepancy is probably dominated by large scale structure fluctuations
in the area used to normalize the K_s-band luminosity function.

2.4. Properties of Early Type Galaxies 

The SDSS data are ideal for studying the correlations of galaxy properties, since 
the photometric and spectroscopic reduction pipelines measure many quantities 
automatically and one can create large samples with well understood selection 
effects. The main effort to date in this area is the comprehensive study of a sam- 
ple of 9000 early type galaxies by Bernardi et al. (2001). They determine the 
fundamental plane in the g, r, i, and z bands, and they measure bivariate cor- 
relations among luminosity, size, velocity dispersion, color, mass-to-light ratios, 
and spectral indices. The large sample size and high precision allow examination
of relatively subtle effects, such as a slight difference in the fundamental plane 
of "field" and "cluster" galaxies. The evolution of the fundamental plane over 
the redshift range of the sample (which extends to z ≈ 0.2) is consistent with
passive evolution of old stellar populations. 
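The fundamental plane is a linear relation in logarithmic variables, log R_e = a log σ + b log I_e + c, relating effective radius, velocity dispersion, and mean surface brightness. The sketch below fits it by ordinary least squares to synthetic data generated from hypothetical coefficients. Both the plain least-squares fit and the numbers are assumptions of this illustration; a real analysis such as Bernardi et al.'s needs a more careful treatment of measurement errors and selection effects.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 9000                                   # sample size comparable to the text
log_sigma = rng.normal(2.2, 0.1, n)        # hypothetical log velocity dispersions
log_ie = rng.normal(3.0, 0.3, n)           # hypothetical log mean surface brightness
a_true, b_true, c_true = 1.5, -0.8, -1.0   # hypothetical plane coefficients
log_re = a_true * log_sigma + b_true * log_ie + c_true + rng.normal(0.0, 0.05, n)

# Design matrix [log sigma, log I_e, 1]; solve for (a, b, c) by least squares.
design = np.column_stack([log_sigma, log_ie, np.ones(n)])
(a_fit, b_fit, c_fit), *_ = np.linalg.lstsq(design, log_re, rcond=None)
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}, c = {c_fit:.2f}")
```

With thousands of galaxies the statistical errors on the slopes are tiny, which is what makes subtle effects, such as small environmental differences in the plane, measurable.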

2.5. Galaxy-Galaxy Lensing

One of the most dramatic breakthroughs from the SDSS has been the mea- 
surement of galaxy mass profiles and the galaxy-mass correlation function via 
galaxy-galaxy weak lensing. Systematic effects are much easier to control for 
galaxy-galaxy lensing than for cosmic shear measurements because image dis- 
tortion is measured perpendicular to the radial separation vector, which has a 
different orientation for each foreground-background pair. The large area gives 
the SDSS great statistical power despite its rather shallow imaging (by weak 
lensing standards). 
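The robustness argument can be illustrated with the usual tangential/cross decomposition of a source ellipticity (e1, e2) about the lens direction. The sketch uses one common sign convention (an assumption; conventions differ between papers) and ignores shear calibration. It also shows the point made above: a constant spurious distortion, such as uncorrected PSF anisotropy, averages away when lens-source position angles are spread uniformly, because its projection onto the tangential direction changes sign around the circle.

```python
import math


def tangential_cross(e1, e2, phi):
    """Rotate ellipticity (e1, e2) into tangential and cross components.

    phi: position angle (radians) of the source relative to the lens.
    Convention assumed here: gamma_t = -(e1 cos 2phi + e2 sin 2phi), so an image
    elongated perpendicular to the lens-source vector has gamma_t > 0.
    """
    c, s = math.cos(2.0 * phi), math.sin(2.0 * phi)
    gamma_t = -(e1 * c + e2 * s)
    gamma_x = e1 * s - e2 * c
    return gamma_t, gamma_x


# A source on the lens's x-axis (phi = 0), stretched along y (e1 < 0):
print(tangential_cross(-0.1, 0.0, 0.0))  # tangential component +0.1

# A spatially constant spurious distortion averaged over uniformly spread position angles:
n_pairs = 360
phis = [2.0 * math.pi * k / n_pairs for k in range(n_pairs)]
mean_gt = sum(tangential_cross(0.05, 0.02, p)[0] for p in phis) / n_pairs
print(f"{mean_gt:.2e}")  # consistent with zero
```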

Fischer et al.'s (2000) analysis of two nights of SDSS imaging data (225
deg^2) was at the time the clearest detection of a galaxy-galaxy lensing signal,
with extended shear profiles (to r ≳ 250 h^-1 kpc) offering direct evidence for the
extended dark matter halos expected in standard models of galaxy formation.
McKay et al. (2001) have since analyzed a sample of ~3,600,000 source galaxies
around ~35,000 foreground galaxies in the spectroscopic sample, measuring
the galaxy-mass correlation function and its dependence on galaxy luminosity,
morphology, and environment. They find that the mass within an aperture of
260 h^-1 kpc scales linearly with galaxy luminosity, that the excess mass density
around galaxies in high density regions remains positive to r ~ 1 h^-1 Mpc while
that around isolated galaxies is undetectable beyond ~300 h^-1 kpc, and that
early type galaxies have a higher amplitude galaxy-mass correlation function,
in part because of their preferential location in group environments. Guzik &
Seljak (2002) have modeled the McKay et al. results to infer that L* galaxies
have virial masses M ~ (5-10) × 10^11 h^-1 M_⊙, implying that a large fraction
of the baryons within the virial radius of an L* galaxy halo end up as stars
in the central galaxy. By comparing to the Tully-Fisher relation, Seljak (2002)
concludes that circular velocities at the halo virial radius are typically a factor
~1.7-1.8 below the values measured at the galaxy optical radius, in
reasonably good agreement with predictions based on CDM halo profiles.

2.6. Angular Galaxy Clustering 

Early efforts to study galaxy clustering with the SDSS have focused on the anal- 
ysis of a 2.5° x 90° stripe of imaging data that has been closely examined and 
reduced multiple times. Scranton et al. (2001) carried out an exhaustive analysis 
of possible systematic effects associated with seeing variations, stellar density, 
Galactic reddening, galaxy deblending, variations across the imaging camera, 
and so forth, showing that they have no significant impact on measurements 
of the angular correlation function at the obtainable level of statistical preci- 
sion. These experiments demonstrate that star-galaxy separation in the SDSS 
imaging works extremely well to r ≈ 22. Dodelson et al. (2001) modeled the
measurements of the angular correlation function (Connolly et al. 2001) and the 
angular power spectrum (Tegmark et al. 2001), to infer the 3-dimensional
clustering of galaxies. Their results, σ_8 ≈ 0.8-0.9 and Γ ≈ 0.15, are consistent with
those obtained by Szalay et al. (2001) applying a different method, Karhunen-Loève
parameter estimation, to the same galaxy catalog. The statistical error
bars from these analyses are not yet competitive with the highest precision anal- 
yses of the galaxy power spectrum (e.g., Percival et al. 2001), but they provide 
reassuring evidence that any systematic biases in the SDSS imaging data, and 
thus in the input to the redshift survey, are well controlled. Recently Szapudi 
et al. (2001) have analyzed the higher order angular moments of this data set, 
finding agreement with the hierarchical scalings and values of skewness and kurtosis
parameters predicted by ΛCDM models that incorporate mild suppression
of galaxies in high mass halos. With the analysis tools now developed and tested 
and the systematic issues apparently well understood, the analyses of larger sky 
areas should soon yield precise measurements of angular clustering over a wide 
dynamic range. 
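The link between angular and three-dimensional clustering is Limber's projection. A minimal sketch in a Euclidean, small-angle toy geometry follows, assuming a power-law ξ(r) = (r/r_0)^-γ and a hypothetical Gaussian radial selection function. This is not the power-spectrum inversion used by Dodelson et al., but it shows the basic relation: a power law ξ ∝ r^-γ projects to w(θ) ∝ θ^(1-γ), independent of the selection function.

```python
import math


def projected_xi(r_perp, r0, gamma):
    """Integral of xi(r) = (r / r0)^-gamma along the line of sight at projected
    separation r_perp (requires gamma > 1)."""
    return (r0 ** gamma * r_perp ** (1.0 - gamma) * math.sqrt(math.pi)
            * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0))


def w_theta(theta, r0, gamma, density, x_max, n=4000):
    """Small-angle Limber projection in a Euclidean toy geometry.

    density(x): hypothetical expected galaxy number density at distance x.
    w(theta) = int x^4 n(x)^2 P(x theta) dx / (int x^2 n(x) dx)^2,
    where P is the projected correlation function above.
    """
    dx = x_max / n
    num = den = 0.0
    for k in range(1, n + 1):  # midpoint rule; avoids x = 0
        x = (k - 0.5) * dx
        nx = density(x)
        num += x ** 4 * nx ** 2 * projected_xi(x * theta, r0, gamma) * dx
        den += x ** 2 * nx * dx
    return num / den ** 2


def toy_density(x):
    """Hypothetical Gaussian selection, x in h^-1 Mpc (normalization cancels in w)."""
    return math.exp(-(x / 300.0) ** 2)


GAMMA = 1.8
w1 = w_theta(0.01, 5.0, GAMMA, toy_density, 1500.0)
w2 = w_theta(0.02, 5.0, GAMMA, toy_density, 1500.0)
slope = math.log(w2 / w1) / math.log(2.0)
print(f"d log w / d log theta = {slope:.3f}  (expected 1 - gamma = {1 - GAMMA:.3f})")
```

The selection function sets the amplitude of w(θ) but not its slope, which is why angular measurements constrain the shape of three-dimensional clustering once the redshift distribution is known.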

2.7. Clustering in the Redshift Survey 

Zehavi et al. (2001) carried out the first analysis of clustering in the SDSS redshift
survey, focusing on the real space correlation function ξ(r) and the pairwise
velocity dispersion σ_12(r) for different classes of galaxies, with a sample similar
in size and geometry to the LCRS. Galaxies in absolute magnitude bins centered
on M* + 1.5, M*, and M* - 1.5 have parallel power-law correlation functions with
slopes γ ≈ -1.8, but their amplitudes are significantly different, with r_0 ≈ 4.7,
6.3, and 7.4 h^-1 Mpc, respectively. The correlation function of red galaxies is
both steeper and higher amplitude than that of blue galaxies. The pairwise
dispersion for the full sample is σ_12 ≈ 600 km s^-1 at r ~ 1 h^-1 Mpc, but red galaxies
have σ_12 ≈ 700 km s^-1 and blue galaxies only σ_12 ≈ 400 km s^-1. The dependence
of ξ(r) on galaxy properties resembles that found by Norberg et al. (2001a,c) in
the 2dFGRS (and in earlier studies such as Guzzo et al. 1997), but there are
significant differences of detail. The SDSS data show a steady trend of correlation
strength with luminosity, while Norberg et al. (2001a) find a transition
from weak dependence below L* to strong dependence above L*. Norberg et al.
(2001c) find similar ξ(r) slopes for galaxies of different spectral types, while the
correlation function of blue galaxies in the SDSS is clearly shallower than that
of red galaxies. Analysis of larger SDSS samples should clarify the significance
of these differences; the first could reflect the difference between r-band and
b_J-band selection, and the second could represent a difference between color and
spectral type as a basis for galaxy classification. A consequence of the SDSS
observing strategy (dictated largely by the instruments themselves) is that the 
early redshift data had a 2-dimensional slice geometry, making it difficult to 
study the large scale power spectrum and statistics that require contiguous 3-d 
volumes, like void probabilities and topology. That situation is changing as the 
survey progresses, and first results on these topics should emerge over the next 
several months. 
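Because the luminosity-bin correlation functions are parallel power laws, their relative bias follows directly from the correlation lengths. If ξ(r) = (r/r_0)^-γ with a common γ, then ξ_a/ξ_b = (r_0,a/r_0,b)^γ at every separation, and the relative bias b_a/b_b = sqrt(ξ_a/ξ_b) = (r_0,a/r_0,b)^(γ/2). The quick check below takes γ = 1.8 and the r_0 values quoted above. Defining relative bias as the square root of the correlation-function ratio is the usual convention and is assumed here.

```python
GAMMA = 1.8                   # common slope, xi(r) = (r / r0)^-GAMMA
R0_VALUES = (4.7, 6.3, 7.4)   # correlation lengths quoted in the text, h^-1 Mpc


def xi(r, r0, gamma=GAMMA):
    """Power-law real-space correlation function."""
    return (r / r0) ** (-gamma)


def relative_bias(r0_a, r0_b, gamma=GAMMA):
    """b_a / b_b = sqrt(xi_a / xi_b); scale-independent for parallel power laws."""
    return (r0_a / r0_b) ** (gamma / 2.0)


for r0 in R0_VALUES:
    ratio = relative_bias(r0, R0_VALUES[1])
    print(f"r0 = {r0} h^-1 Mpc: xi(5 h^-1 Mpc) = {xi(5.0, r0):.2f}, "
          f"b / b(r0 = 6.3) = {ratio:.2f}")

# The ratio of xi between two samples is the same at every separation:
print(xi(1.0, 7.4) / xi(1.0, 4.7), xi(10.0, 7.4) / xi(10.0, 4.7))
```

The most and least strongly clustered samples therefore differ in bias by a factor of about (7.4/4.7)^0.9 ≈ 1.5.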

2.8. Random Optimistic Remarks 

The SDSS collaboration involves hundreds of scientists, with eleven participating 
institutions on three continents. With such a large and far-flung collaboration, 
we spend a lot of energy just keeping ourselves organized. I have just finished
my term as the SDSS Scientific Publications Coordinator, a position with one 
chief benefit: I was forced to pay attention as the scientific output of the SDSS 
grew from a trickle to a flood, spreading rivulets into many different areas of 
astronomy. This development has been exciting to watch, and I have learned 
a lot of astronomy just by following it. While there are certainly challenges 
of communication in a collaboration this large, the process of going from data 
to science has worked, in my opinion, remarkably well. The ideal scenario is 
that each scientific analysis draws on the collective expertise of a very broad 
spectrum of astronomers; I have been delighted to see how often we approach 
this ideal in practice. The richness of the SDSS data is more than enough to 
keep us busy. Indeed, while I am sure that we look enormous from the outside, 
it is constantly evident from the inside that we don't have enough people to do 
all the science we would like to be doing. That, of course, is one of many good 
reasons for publishing the data. 

Before completely shifting gears, let me pause to congratulate the mem- 
bers of the 2dF galaxy and quasar redshift surveys for (a) obtaining more than 
200,000 spectra, (b) publishing more than 100,000 spectra, and (c) writing a 
number of beautiful papers analyzing the results and implications. All three of 
these are great achievements. While the SDSS and 2dF teams cannot help but 
see themselves in competition every now and then, the benefits to astronomy of 
having these independent data sets and independent analyses are already very 
clear. 



3. What Can We Learn From Galaxy Clustering? 

The above question is one that has been pondered by many people over several
decades. Two developments that color recent considerations of this subject
are the extraordinary improvements in the quantity and quality of the redshift
survey data and the convergence of the cosmological community on a "standard"
model, ΛCDM, that is supported by an impressive base of observational
evidence. A third development that has deeply affected my own thinking is the
emergence of a new way of describing galaxy bias, the Halo Occupation Distribution
(HOD). The HOD characterizes the statistical relation between galaxies
and mass in terms of the probability distribution P(N|M) that a halo of virial
mass M contains N galaxies, together with prescriptions that specify the relative
spatial and velocity distributions of galaxies and dark matter within these halos.
Note that "halo" here refers to a structure of typical overdensity ρ/ρ̄ ~ 200, in
approximate dynamical equilibrium; higher density cores within a group or cluster
are, in this description, treated as substructure, and characterized only in a
statistical sense. Since different types of galaxies have different space densities
and different clustering properties, a given HOD applies to a specific class of
galaxies, e.g., red galaxies brighter than L*, or late-type spirals with absolute r
magnitudes -17 to -19. The HOD framework has roots in early analytic models that
described galaxy clustering as a superposition of randomly distributed clusters 
with specified profiles and a range of masses (Neyman & Scott 1952; Peebles 
1974; McClelland & Silk 1977). A bevy of recent papers have shown that, when 
combined with numerical or analytic models of the clustering of the halos them- 
selves, the HOD is a powerful tool for analytic and numerical calculations of 
clustering statistics, for modeling observed clustering, and for characterizing the
results of semi-analytic or numerical studies of galaxy formation (e.g., Jing, Mo,
& Börner 1998; Benson et al. 2000; Ma & Fry 2000; Peacock & Smith 2000;
Seljak 2000; Berlind & Weinberg 2001; Marinoni & Hudson 2001; Scoccimarro 
et al. 2001; Yoshikawa et al. 2001; White, Hernquist, & Springel 2001; Bullock, 
Wechsler, & Somerville 2002). 

My own interest in this approach was spurred largely by the paper of Ben- 
son et al. (2000), who discussed the clustering predictions of their semi-analytic 
model of galaxy formation in these terms. A forthcoming paper by Berlind et al. 
(in preparation; see also Berlind 2001) compares the predictions of the Benson 
et al. semi-analytic formalism to those of a large, smoothed particle hydrody- 
namics (SPH) simulation (Murali et al. 2001; Dave et al., in preparation), for 
the same cosmological model. The agreement between the two approaches is 
remarkably good. If we select galaxies above a specified baryon mass threshold,
chosen separately in the two calculations so that the space densities of
the populations are equal, then the mean occupation N_avg(M) is a non-linear
function of mass with three basic features: a cutoff mass below which halos
are not massive enough to host a galaxy above the threshold, a low occupancy
regime (N_avg ≲ 2) in which the mean occupation grows slowly with increasing
halo mass but the average galaxy mass itself increases, and a high occupancy
regime in which N_avg(M) grows more steeply with mass, though the growth is
still sub-linear because larger, hotter halos convert a smaller fraction of their
baryons into galaxies. In the low occupancy regime, the fluctuations about the
mean are well below those of a Poisson distribution (a halo that is supposed
to host one galaxy very rarely hosts two), and the sub-Poisson nature of these
fluctuations has a crucial impact on some clustering statistics. The HOD is
strongly dependent on the age of the galaxies' stellar populations; old galaxies
like to live together in massive, high occupancy halos, while young galaxies
studiously avoid them. The SPH simulations further show that the oldest, most
massive galaxy in a halo usually resides near the halo center and moves at close 
to the center-of-mass velocity, while the remaining galaxies approximately trace 
the spatial and velocity distribution of the halo's dark matter. The agreement 
between the semi-analytic and SPH calculations, despite some clear differences 
in the way that they treat radiative cooling and feedback from star formation, 
suggests that the HOD emerges from fairly robust physics that both methods 
do right, given their common assumptions. Whether these assumptions hold in 
the real universe is, of course, one of the things we hope to learn. 
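The importance of sub-Poisson fluctuations is easiest to see in the mean number of galaxy pairs per halo, ⟨N(N-1)⟩, which controls small-scale clustering. For a Poisson distribution ⟨N(N-1)⟩ = N_avg^2. The sketch contrasts this with the narrowest possible distribution, in which N takes only the two integers bracketing N_avg. This "nearest-integer" form is assumed for illustration; the simulations described above are sub-Poisson but not necessarily this extreme.

```python
import math


def pairs_poisson(n_avg):
    """<N(N-1)> for a Poisson distribution with mean n_avg."""
    return n_avg ** 2


def pairs_nearest_integer(n_avg):
    """<N(N-1)> when N is floor(n_avg) or floor(n_avg) + 1, with mean n_avg."""
    n = math.floor(n_avg)
    p = n_avg - n  # probability of the upper integer
    return (1.0 - p) * n * (n - 1) + p * (n + 1) * n


for n_avg in (0.5, 0.9, 1.5, 3.0):
    print(f"N_avg = {n_avg}: Poisson {pairs_poisson(n_avg):.2f}, "
          f"nearest-integer {pairs_nearest_integer(n_avg):.2f}")
```

Halos with N_avg < 1 contribute no pairs at all in the nearest-integer case but a fraction N_avg^2 of a pair in the Poisson case. At high occupation the two forms approach each other in relative terms, so the distinction matters mostly for low mass halos.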

Figure 1 sketches the interplay between the "Cosmological Model" and the 
"Galaxy Formation Theory" in determining galaxy clustering (which I take to 
include the galaxy-mass correlations measured by weak lensing). The HOD ap- 
proach suggests a nice division of labor between these two theoretical inputs. The 
cosmological model, which specifies the initial conditions (e.g., scale-invariant
fluctuations from inflation) and the matter and energy contents (e.g., Ω_m and the
density parameters of the other components), determines the mass function, spatial
correlations, and velocity correlations of the dark halo population. At our adopted
overdensity threshold ρ/ρ̄ ~ 200, these properties of the halo population are determined
almost entirely by gravity, with no influence of complex gas physics. I have inserted a box
in the path between cosmological model and dark halo population to indicate
that the only features of the cosmological model that really matter in this
context are Ω_m, the fluctuation amplitude (represented here by σ_8), and the power
spectrum shape (represented here by n and Γ, though it could, of course, be
more interesting). Other features of the cosmological model, such as the energy
density and equation of state of the vacuum component, may have an important
impact on other observables or on the history of matter clustering, but they
have virtually no effect on the halo population at z = 0, if the shape of P(k)
and the present day value of σ_8 are held fixed. The galaxy formation theory
incorporates the additional physical processes (such as shock heating, radiative
cooling, conversion of cold gas into stars, and feedback of star formation on the
surrounding gas) that are essential to producing distinct, dense, bound clumps
of stars and cold gas. It further specifies what aspects of a galaxy's formation
history determine its final mass, luminosity, diameter, color, morphology, and
so forth. These physical processes operate in the background provided by the
evolving halo population, so the predicted HOD depends on both the theory of
galaxy formation and the assumed cosmological model.

Figure 1. The interplay between the cosmological model and the
galaxy formation theory in determining galaxy clustering and galaxy-mass correlations.

As a description of bias, the crowning virtue of the HOD is its complete- 
ness: given a dark halo population and a fully specified HOD, one can predict 
the value of any galaxy clustering statistic, on any scale, using analytic
approximations and/or numerical simulations.¹ Berlind & Weinberg (2001) examined
the influence of the HOD on galaxy clustering and galaxy-mass correlations, for
the halo population of a ΛCDM N-body simulation. We found that different
clustering statistics, or even the same statistic at different scales, are sensitive
to different aspects of the HOD. For example, at large scales ξ_gg(r) = b_g^2 ξ_mm(r),
where ξ_mm(r) is the mass correlation function and the bias factor b_g is the
average of the halo bias factor b_h(M) weighted by the halo number density and
the mean occupation N_avg(M). On small scales, however, the explicit dependence
on ξ_mm(r) disappears, and ξ_gg(r) depends on the halo mass function, on
the mean number of pairs ⟨N(N-1)⟩ as a function of halo mass and virial
radius, and (to a lesser extent) on the internal bias between galaxy profiles and
mass profiles. Connecting these pieces into a power-law galaxy correlation function
is a rather delicate balancing act, and the success of SPH simulations and
semi-analytic models in reproducing the observed form of ξ_gg(r) given a ΛCDM
cosmology is entirely non-trivial; the reduced efficiency of galaxy formation in
high mass halos and the sub-Poisson fluctuations in low mass halos are both
crucial to this success. Higher order correlation functions place greater weight
on the high mass end of the halo population and on higher moments of P(N|M).
The void probability function, on the other hand, depends mainly on the low
mass cutoff of the HOD, which determines the probability of finding galaxies
in the low mass halos that populate large scale underdensities. The pairwise
velocity dispersion has distinct regimes much like ξ_gg(r), but it depends little on
the low mass cutoff and strongly on the relative occupation of high and low mass
halos, and the sub-Poisson fluctuations that depress ξ_gg(r) at small scales boost



1 There is one caveat here, namely the implicit assumption that a halo's galaxy content depends, 
on average, only on its mass, and has no statistical correlation with the halo's large scale 
environment. This assumption is adopted in "merger tree" formulations of the semi-analytic 
method, and it is supported by the N-body experiments of Kauffmann & Lemson (1999) and 
by the results of the SPH simulation mentioned above, but it is not logically incontrovertible. 
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to come from higher mass halos. The pairwise dispersion can also be influenced 
by velocity bias of galaxies within halos. The group multiplicity function bears 
a quite direct relation to the HOD, to such an extent that one can "read off" 
iV av g(M) if P{N\M) is reasonably narrow and one assumes an underlying halo 
mass function n{M). Peacock k, Smith (2000) and Marinoni & Hudson (2001) 
have applied variants of this approach to observational data and obtained results 
that agree rather well with the SPH and semi- analytic predictions, assuming a 
ACDM halo mass function. 
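The "read off" step for the group multiplicity function can be sketched as cumulative abundance matching: if $P(N|M)$ is narrow and occupation rises monotonically with mass, groups with at least $N$ members should occupy halos above the mass $M$ at which the cumulative halo density $n(>M)$ equals the cumulative group density $n_{grp}(\ge N)$. The Python sketch below is a minimal illustration under those assumptions, interpolating a tabulated $n(>M)$ in log-log space. The tabulated mass function and group counts are hypothetical, and this is not necessarily the exact procedure used by Peacock & Smith (2000) or Marinoni & Hudson (2001).

```python
import bisect
import math


def mass_at_density(n_target, masses, n_cum):
    """Return M such that n(>M) = n_target, by log-log interpolation.

    masses must be ascending and n_cum (the cumulative halo density
    n(>M) at each mass) strictly decreasing.
    """
    # -log10 n increases with M, which is the ordering bisect needs
    x = [-math.log10(n) for n in n_cum]
    t = -math.log10(n_target)
    if not x[0] <= t <= x[-1]:
        raise ValueError("target density outside tabulated range")
    i = max(bisect.bisect_left(x, t), 1)
    w = (t - x[i - 1]) / (x[i] - x[i - 1])
    log_m = math.log10(masses[i - 1]) + w * (
        math.log10(masses[i]) - math.log10(masses[i - 1]))
    return 10 ** log_m


def read_off_occupation(groups, masses, n_cum):
    """Map each (N, n_grp(>=N)) pair to (M, N): a rough N_avg(M) estimate."""
    return [(mass_at_density(n_grp, masses, n_cum), N) for N, n_grp in groups]


# Hypothetical tabulated halo mass function (M in h^-1 M_sun, n in h^3 Mpc^-3)
masses = [1e12, 1e13, 1e14, 1e15]
n_cum = [1e-2, 1e-3, 1e-4, 1e-5]
# Hypothetical group multiplicity function: (N, density of groups with >= N members)
groups = [(1, 1e-3), (5, 10 ** -3.5), (20, 1e-4)]

for M, N in read_off_occupation(groups, masses, n_cum):
    print(f"N >= {N:3d}  ->  M ~ {M:.2e}")
```

The one-to-one matching is exactly the simplification that breaks down when $P(N|M)$ is broad, which is why the text conditions the method on a reasonably narrow distribution.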

Berlind and I concluded that an empirical determination of the HOD should 
be possible given high precision clustering measurements and the halo popula- 
tion of an assumed cosmological model. This, at a minimum, is what we can 
expect to learn from galaxy clustering: the halo occupation distributions of 
many different classes of galaxies, given a cosmological model motivated by in- 
dependent observations. Because the HOD description of bias is complete, these 
HODs encode everything that galaxy clustering has to teach us about galaxy 
formation. They encode it, moreover, in a physically informative way, allowing 
detailed tests of theoretical predictions and providing rather specific guidance 
when these predictions fail. If your theory of galaxy formation does almost
everything right but puts too many blue-ish S0 galaxies in $10^{13}$ to $10^{14}\,M_\odot$
halos, then you might have some ideas on how to fix it.

Can we have our cake and eat it too? In more precise words, if we find 
a combination of cosmological model and HOD that matches all the galaxy 
clustering data, can we conclude that both are correct, or might there be other 
combinations that are equally successful? 

To decide whether cosmology and bias are degenerate with respect to galaxy
clustering, we must first know how changing the cosmology alters the halo
population. This issue is the subject of a forthcoming paper by Zheng et al., where
we investigate the effect of changing $\Omega_m$ on its own, of changing $\Omega_m$ and $\sigma_8$
simultaneously while maintaining "cluster normalization" ($\sigma_8\Omega_m^{0.5}$ = constant),
and of changing $\Omega_m$ and $\sigma_8$ in concert with $n$ or $\Gamma$. The impact of a pure $\Omega_m$
change is simple: the halo mass scale $M_*$ shifts in proportion to $\Omega_m$, pairwise
velocities (at fixed $M/M_*$) are proportional to $\Omega_m^{0.5}$, and halo clustering at fixed
$M/M_*$ is unchanged. Cluster-normalized changes to $\Omega_m$ and $\sigma_8$ keep the space
density of halos approximately constant near $M \sim 5\times10^{14}\,h^{-1}M_\odot$, and halo
clustering and pairwise velocities remain similar at fixed $M$. However, the shape
of the halo mass function changes, with a decrease of $\Omega_m$ from 0.3 to 0.2
producing a ~30% drop in the number of low mass halos. One can preserve the
shape of the mass function over a large dynamic range by changing $n$ or $\Gamma$, but
the required changes are substantial: e.g., masking a decrease of $\Omega_m$ from 0.3
to 0.2 requires $\Delta n \approx 0.3$ or $\Delta\Gamma \approx 0.15$. These changes to the power spectrum
significantly alter the halo clustering and halo velocities.
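The simple scalings quoted above can be tabulated in a few lines of Python. This sketch assumes cluster normalization of the form $\sigma_8\Omega_m^{0.5} = 0.5$ (the amplitude 0.5 is an illustrative assumption, not a value given in this text), together with $M_* \propto \Omega_m$ and pairwise velocities at fixed $M/M_*$ scaling as $\Omega_m^{0.5}$ for a pure $\Omega_m$ change. It does not compute the halo mass function itself, so the ~30% drop in low mass halos is not reproduced here.

```python
def sigma8_cluster_normalized(omega_m, amplitude=0.5, exponent=0.5):
    """sigma_8 implied by sigma_8 * Omega_m**exponent = amplitude.

    amplitude=0.5 is an illustrative assumption, not a fitted value.
    """
    return amplitude / omega_m ** exponent


def pure_omega_shift(omega_old, omega_new):
    """Ratios (new/old) for a pure Omega_m change at fixed sigma_8 and shape.

    M_* scales as Omega_m; pairwise velocities at fixed M/M_* scale as
    Omega_m**0.5; halo clustering at fixed M/M_* is unchanged (ratio 1).
    """
    r = omega_new / omega_old
    return {"M_star": r, "velocity": r ** 0.5, "clustering": 1.0}


for om in (0.3, 0.2):
    print(f"Omega_m = {om}: sigma_8 = {sigma8_cluster_normalized(om):.3f}")
print(pure_omega_shift(0.3, 0.2))
```

Lowering $\Omega_m$ from 0.3 to 0.2 at fixed $\sigma_8$ shrinks $M_*$ by a third and pairwise velocities by roughly 18%, which is why dynamically sensitive statistics can detect a pure $\Omega_m$ shift that the spatial clustering alone would miss.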

The sensitivity of the halo population to the cosmological model parameters
is encouraging, because these changes cannot easily be masked by changing the
HOD. For a pure $\Omega_m$ shift, one could keep the spatial clustering of galaxies the
same by using the same HOD as a function of $M/M_*$, but the change would
be detected by any dynamically sensitive clustering statistic, such as large scale
redshift-space distortions, the pairwise velocity dispersion, the galaxy-mass
correlation function, or direct measurements of group and cluster masses. Even
velocity bias within halos could not hide all of these changes. A cluster-normalized
change to $\sigma_8$ and $\Omega_m$ would require a change in galaxy occupation as a function
of $M/M_*$ in order to maintain the galaxy space density and group multiplicity
function, and this change would affect other measures of galaxy clustering. A
simultaneous change to the power spectrum shape that preserved the halo mass
function would change galaxy clustering by changing the clustering of the halos
themselves.

It remains to be seen just how well one can do quantitatively from realistic
observations. The proof, ultimately, must await the pudding, but Zheng has
begun to investigate the question in a somewhat idealized context. As a starting
point, he takes clustering measures predicted by a $\Lambda$CDM cosmology with the
HOD derived from the SPH simulations, calculated by a variety of analytic
approximations. He then changes the assumed cosmology, thus changing the halo
mass function and halo clustering, and he allows the HOD to change as well,
using a parametrized form that gives flexibility in all of the essential features.
He finds the HOD that gives minimum $\chi^2$ for the original clustering
"measurements," which are assumed to have 10% fractional uncertainties, and the
value of $\Delta\chi^2$ for the best-fit HOD indicates the acceptability of the
cosmological model. The preliminary results from this exercise are encouraging.
For example, in the case of pure $\Omega_m$ changes, the galaxy correlation function
and group multiplicity function constrain the HOD tightly enough that
measurements of $\beta = \Omega_m^{0.6}/b$ or the pairwise velocity dispersion impose
useful constraints on $\Omega_m$. As the SDSS and 2dFGRS measurements take shape,
we can imagine taking a similar approach to the real data, albeit with careful
attention to the accuracy of the clustering approximations. In terms of Figure 1,
the surveys provide us with the entries in the lowest box, and using them, we
search for maximum likelihood solutions for the parameters in the second boxes on
the left and right hand sides. Despite what might at first appear to be a lot of
freedom, the degeneracies appear to be limited, and we can hope to do rather well.
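The fitting step described above reduces, schematically, to minimizing $\chi^2 = \sum_i [(m_i - d_i)/(0.1\,d_i)]^2$ over the HOD parameters for each trial cosmology, with 10% fractional errors on each "measurement" $d_i$ and no covariance between them (a simplifying assumption). The Python sketch below uses a plain grid search and a hypothetical power-law stand-in, `toy_xi`, in place of the analytic HOD clustering approximations; it shows the bookkeeping only, not Zheng's actual parametrization or code.

```python
def chi2(model, data, frac_err=0.10):
    """chi^2 with independent fractional errors frac_err * d_i (no covariance)."""
    return sum(((m - d) / (frac_err * d)) ** 2 for m, d in zip(model, data))


def best_fit(data, predict, grid, frac_err=0.10):
    """Grid search: return (best parameter, minimum chi^2) for one cosmology."""
    best = min(grid, key=lambda p: chi2(predict(p), data, frac_err))
    return best, chi2(predict(best), data, frac_err)


RADII = [1.0, 2.0, 5.0, 10.0]  # separations, hypothetical units of h^-1 Mpc


def toy_xi(r0, slope=1.8):
    # Hypothetical stand-in for an HOD-based prediction: xi(r) = (r / r0)**(-slope)
    return [(r / r0) ** -slope for r in RADII]


data = toy_xi(5.0)  # pretend "measurements" generated with r0 = 5
grid = [4.0 + 0.25 * k for k in range(9)]  # 4.0, 4.25, ..., 6.0
print(best_fit(data, toy_xi, grid))

# A "different cosmology" is mimicked (hypothetically) by changing the slope;
# the HOD-like parameter r0 is refit, and the residual chi^2 is what matters.
print(best_fit(data, lambda p: toy_xi(p, slope=1.6), grid))
```

In the first call the refit recovers the input parameter with zero $\chi^2$; in the second no choice of the free parameter can absorb the changed shape, so the minimum $\chi^2$ stays positive, which is the sense in which $\Delta\chi^2$ for the best-fit HOD measures a cosmology's acceptability.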

Here, then, is my conjectural answer to the question posed in the section 
title: we can learn the HOD of different classes of galaxies, gaining physical 
insight into their origin, and we can separately determine $\Omega_m$ and the amplitude
and shape of the linear theory power spectrum, from the largest scales probed by 
the surveys (where perturbation theory describes the dark matter and the HOD 
fixes the "bias factors" needed to connect galaxies to mass) down to moderately 
non-linear scales (below which information about the linear power spectrum may 
be effectively erased, at least as far as the halo mass function and halo clustering 
are concerned). We can also test for any departures from Gaussian primordial 
fluctuations. We get these cosmological constraints without relying on a detailed 
theory of galaxy formation, only on the basic tenet that the HOD formulation 
itself is valid. While we might be wary of relying on conclusions that involve 
complicated corrections for galaxy bias, the observed dependence of clustering 
on galaxy type allows powerful cross-checks. When we analyze different classes 
of galaxies, we should derive different HODs, but we should always reach the 
same conclusions about the underlying cosmological model. If we do, then we 
have good reason to think that we are doing things right. 

Given all the other methods that can constrain cosmology with tracers that 
are less physically complicated, one might wonder what galaxy clustering and 
galaxy-mass correlations have to contribute to cosmological tests, beyond a
reassuring consistency check. After all, how important is the second decimal place
on $\Omega_m$? While I hear variants of this question often, I think it is a red herring,
and that we should be relentless in our efforts to squeeze as much cosmological 
information as possible out of galaxy redshift surveys. Even if we assume that 
there will be no major conceptual adjustments to the current leading model, 
there are at least two fundamental issues on which precision measurements from 
galaxy clustering can play a critical role: the contribution (if any) of gravity 
waves to CMB anisotropy, and the equation of state of dark energy. The first 
can be addressed by a precise comparison between the CMB fluctuation am- 
plitude and the present day amplitude of matter fluctuations. Evidence for or 
against gravity waves would take us much further in understanding the origin 
of density fluctuations, and perhaps even to understanding the mechanism (in- 
flation, colliding branes, ...) that accounts for the size and homogeneity of the 
universe. Galaxy clustering has no sensitivity to the equation of state on its 
own, but the sensitivity of other tests depends crucially on precise knowledge of 
$\Omega_m$, where the combination of galaxy clustering and galaxy-galaxy lensing may
ultimately provide the best constraints. Precise knowledge of today's fluctuation 
amplitude is also essential to some tests for the equation of state and its time 
dependence (see, e.g., the discussion of Kujat et al. 2001). 

Constraining gravity waves, the dark energy equation of state, and neutrino 
masses are concrete goals that we can set for cosmological applications of galaxy 
clustering. But we should not assume that the simplest model consistent with 
the current data (which already contains at least one very surprising element) 
will remain consistent with improving observations. A break in the inflation- 
ary fluctuation spectrum, a relativistic background inconsistent with standard 
neutrino physics, a baryon density inconsistent with big bang nucleosynthesis, a 
small admixture of non-Gaussian or isocurvature fluctuations — all of these are 
departures from the standard model whose quantitative impact would be subtle 
but whose physical implications would be profound. What we will learn from 
the 2dF and SDSS galaxy surveys depends in large part on what the universe 
has to teach us, and that is something we cannot yet know. Finding out is an 
exciting task for the New Era in Cosmology. 



I am grateful to my numerous colleagues in the SDSS for producing the 
exciting results that I have recapitulated in §2, and for their efforts and progress 
in producing a data set that warrants the theoretical musings in §3. I thank 
my collaborators on the work discussed in §3, especially Andreas Berlind, Zheng 
Zheng, and Jeremy Tinker, whose contributions to the ideas and to the results 
have been central. I thank the NSF for its support of this research program 
and the Institute for Advanced Study and the Ambrose Monell Foundation for 
hospitality and support during the recent phases of this work. More details 
about the SDSS, including links to the Early Data Release, an ever-growing list 
of scientific publications based on the SDSS data, and a list of the many
participating institutions and funding agencies that have made the survey possible,
can be found at the official SDSS web site, http://www.sdss.org.